Ear & Hearing — Latest Matching Preprints

1

Repolarisation Speed May Vary with Characteristic Frequency in Human Spiral Ganglion Cells: Preliminary Observation from Electrically Evoked Compound Action Potentials

Lien, J. T.-H.; Strahl, S.; Garcia, C.; Vickers, D.

2026-04-24 otolaryngology 10.64898/2026.04.23.26351590 medRxiv

Top 0.1%

27.9%

The human auditory system decomposes complex sounds into distinct components via a collection of processing steps. Knowing whether Spiral Ganglion Cells (SGCs) play an active role in the decoding of complex sounds can facilitate the development of Cochlear Implant (CI) coding strategies and clinical assessment tools. Early animal studies reported SGCs being similar across different characteristic frequencies (CFs). In this study, human electrically evoked compound action potentials (eCAPs) were analysed to probe the relationship between the reciprocal of CF and the duration of the eCAP. A significant relationship could indicate that SGCs may not simply be passive cables. eCAP datasets from 6 published studies (175 CI users, 1243 recordings) were analysed and their peaks were automatically labelled. The n1p2 latency was derived for each recording as a proxy of the action potential duration. The CF of each recording was estimated by mapping the average insertion angle of the electrode to the human SGC map. A weak but statistically significant relationship was observed between the n1p2 latency and the reciprocal of CF (random-effects model with random intercepts for subject, r = 0.09, p = 0.024, n = 450), supporting the hypothesis that lower CF is associated with slower repolarisation (longer n1p2 latency) in human spiral ganglion cells.

2

Development and clinical application of a consonant confusion task to evaluate hearing aid benefit

Hajicek, J.; Harris, S. E.; Neely, S. T.

2026-04-24 otolaryngology 10.64898/2026.04.23.26351598 medRxiv

Top 0.1%

27.9%

Purpose: This research sought to develop a low-cognitive-load speech-in-noise test based on consonant confusions with the potential for assessing hearing-aid benefit. Methods: Vowel-consonant-vowel (VCV) stimuli with added speech-shaped noise were presented as a closed-set consonant identification task. Initially, consonant-confusion matrices were used to select, from a larger set of consonants and vowel contexts, a set of ten consonants and associated signal-to-noise ratios (SNR) that were sensitive to hearing loss. The sensitivity of the qVCV test to hearing loss was validated by comparing predicted pure-tone average (PTA) hearing thresholds with their audiometric PTA. Clinical viability of the qVCV test was assessed by comparisons to the QuickSIN test. Hearing-aid benefit was assessed by comparing test scores in unaided and aided conditions. Results: The consonants most sensitive to hearing loss were /b d g t k v z s ʃ n/ in the vowel context /[a]/. A cross-validated prediction of PTA had a mean absolute error of 5.7 dB. The repeatability of qVCV at 50 trials was equivalent to the QuickSIN average of two lists. Hearing-aid benefit was quantified as a decibel reduction in hearing loss. Conclusions: qVCV and QuickSIN performed similarly when test times were equated. The advantages of qVCV include lower cognitive demand, fewer learning effects, and automated scoring. A qVCV-predicted PTA that greatly exceeds the audiometric PTA may indicate either cognitive deficits or cochlear neural degeneration. The qVCV quantification of hearing-aid benefit may have clinical value.

3

Can Multimodal Large Language Models Visually Interpret Auditory Brainstem Responses?

Jedrzejczak, W.; Kochanek, K.; Skarzynski, H.

2026-04-17 otolaryngology 10.64898/2026.04.15.26350944 medRxiv

Top 0.1%

23.7%

Introduction: Auditory brainstem response (ABR) is a standard objective method for estimating hearing threshold, especially in patients who cannot reliably participate in behavioral audiometry. However, ABR interpretation is usually performed by an expert. This study evaluated whether two general-purpose artificial intelligence (AI) multimodal large language model (LLM) chatbots, ChatGPT and Qwen, can accurately estimate ABR hearing thresholds from ABR waveform images. Accuracy was measured by comparison with the judgements of 3 expert audiologists. Methods: A total of 500 images, each containing several ABR waveforms recorded at different stimulus intensities, were analyzed. Three expert audiologists established the reference auditory thresholds based on visual identification of wave V at the lowest stimulus intensity, with the most frequent judgment among the three used as the reference. Each waveform image was independently submitted to ChatGPT (version 5.1) and Qwen (version 3Max) using the same standardized prompt and without additional clinical context. Agreement with the expert thresholds was assessed as mean errors and correlations. Sensitivity and specificity for detecting hearing loss (>20 dB nHL) were also calculated. In cases where the AI and expert thresholds nominally matched, corresponding latency measures were also compared. Results: Auditory thresholds derived from both LLMs correlated strongly with expert opinion, with Pearson r = 0.954 for ChatGPT and r = 0.958 for Qwen. ChatGPT showed a mean error of +5.5 dB and Qwen showed a mean error of -2.7 dB. Exact nominal agreement with expert values was achieved in 34.6% of ChatGPT estimates and 35.6% of Qwen estimates; agreement within ±10 dB was observed in 75.6% and 80.0% of cases, respectively. For hearing-loss classification, ChatGPT achieved 100% sensitivity but low specificity (20.4%), whereas Qwen showed a more balanced profile with 91.6% sensitivity and 67.5% specificity. Curiously, estimates of wave V latency were markedly poor for both LLMs, with systematic underestimation and weak correlations with the expert judgements. Conclusion: ChatGPT and Qwen demonstrated a moderate ability to estimate ABR thresholds from waveform images, although their performance was not good enough for independent clinical use. Both models captured general patterns of hearing loss severity, but there was systematic bias, a poor balance between sensitivity and specificity, and poor latency estimation. General-purpose multimodal LLMs may have potential as assistive or preliminary tools, but clinically reliable ABR interpretation will likely require specialized, domain-trained AI systems with expert oversight.
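The agreement measures named in this abstract (mean error, exact and ±10 dB agreement, sensitivity and specificity at a 20 dB nHL cutoff) can be illustrated with a minimal Python sketch. The function name `agreement_metrics` and the threshold values below are hypothetical and made up for illustration; they are not the study's code or data, and the abstract does not say how the authors computed these figures.

```python
from statistics import mean

def agreement_metrics(expert, model, cutoff_db=20, tolerance_db=10):
    """Compare model threshold estimates (dB nHL) with expert reference values."""
    pairs = list(zip(expert, model))
    errors = [m - e for e, m in pairs]  # positive = model overestimates the threshold
    # Hearing loss is taken here as a threshold above the cutoff (assumed rule).
    tp = sum(e > cutoff_db and m > cutoff_db for e, m in pairs)
    fn = sum(e > cutoff_db and m <= cutoff_db for e, m in pairs)
    tn = sum(e <= cutoff_db and m <= cutoff_db for e, m in pairs)
    fp = sum(e <= cutoff_db and m > cutoff_db for e, m in pairs)
    return {
        "mean_error_db": mean(errors),
        "exact": sum(err == 0 for err in errors) / len(errors),
        "within_tolerance": sum(abs(err) <= tolerance_db for err in errors) / len(errors),
        "sensitivity": tp / (tp + fn),  # undefined if no ear has hearing loss
        "specificity": tn / (tn + fp),  # undefined if every ear has hearing loss
    }

# Made-up illustrative thresholds, not data from the study.
expert = [10, 20, 30, 40, 60, 20]
model = [20, 30, 30, 50, 40, 20]
metrics = agreement_metrics(expert, model)
```

In this toy example the model flags every ear with hearing loss but also one normal ear, which mirrors in miniature the high-sensitivity, lower-specificity pattern the abstract reports for ChatGPT.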

4

Multivariate Prediction of Conductive Dysfunction in Well and NICU Newborns using Wideband Acoustic Immittance with Acoustic Reflex Tests

Hunter, L. L.; Feeney, M. P.; Fitzpatrick, D.; Keefe, D. H.

2026-03-15 otolaryngology 10.64898/2026.03.13.26348314 medRxiv

Top 0.1%

18.8%

Objectives: The overall goal of this study was to assess tympanometric and ambient wideband acoustic immittance (WAI) tests and wideband acoustic reflex thresholds (ART) in well-baby and newborn intensive care (NICU) cohorts with three specific objectives: 1) Assess predictive accuracy for WBT and ART for conductive dysfunction in ears referring on the first or second stages of newborn hearing screening; 2) Identify inadequate tests likely due to probe blockages or leaks; and 3) Assess prediction models separately for well-baby and NICU screening outcomes. Design: Prospective, observational study of full-term (n=514) and premature newborns (n=239) recruited from well-baby and NICU nursery birth hospital newborn hearing screening program. Wideband tympanometry, ambient absorbance, and acoustic reflexes were tested after Stage 1 transient otoacoustic emissions (TEOAE) screening. The reference standard for Pass or Refer groups was initially defined on the Stage 1 TEOAE test result. Pass or Refer groups were then reassigned based on the Stage 2 screening ABR for those who referred at Stage 1, and all NICU infants. Multivariate models were developed using reflectance and admittance variables to predict conductive dysfunction relative to the screening reference standard in a randomized sub-group of subjects at Stage 1 and Stage 2 screening. Classification accuracy was evaluated on a second, independent sub-group. Individual tests were classified as having inadequate probe fits if they had excessively low values of sound pressure level or susceptance (leak) or absorbance (blockage). Results: Comparison of ambient absorbance between Pass and Refer screening groups revealed the greatest differences and effect sizes in frequency bins between 1.4 and 2 kHz. Screening failure at both Stage 1 and 2 was most accurately predicted by models using ambient absorbance and power level variables at frequencies between 1 and 2.8 kHz, including ARTs. Tympanometric admittance variables at the positive-pressure tail for frequencies between 1 and 2.8 kHz in combination with the ART were more accurate predictors than those at peak pressure or the negative-pressure tail. Multivariate models generalized well to an independent group of infants at both Stage 1 and 2 for both the ambient and tympanometric models. Ambient tests revealed more inadequate tests than tympanometric tests, primarily due to blocked probe tips. Exclusion of ears to detect probe leaks or blockages slightly improved the ambient prediction models, but did not affect tympanometric models. Conclusion: Wideband acoustic reflex tests improved all models for ambient and tympanometric absorbance. Multivariate prediction models developed for WAI tests were repeatable in an independent group of well and NICU infants, suggesting that the results are generalizable to these populations. Detection of probe blockage or leaks slightly improved prediction for ambient measures. Pressurized tests have the advantage of ensuring probe seals due to the need for a hermetic seal, and are thus useful to ensure adequate probe insertion.

5

Early life factors documented in electronic health records predict recurrent acute otitis media

Hurst, J. H.; Zhao, C.; Raynor, E. M.; Lee, J.; Gitomer, S. A.; Woods, C. W.; Kelly, M. S.; Smith, M. J.; Goldstein, B. A.

2026-03-09 pediatrics 10.64898/2026.03.07.26347843 medRxiv

Top 0.1%

15.1%

Background and Objectives: Recurrent acute otitis media (rAOM; defined as ≥3 AOM episodes in 6 months or ≥4 episodes in 12 months) affects 10-15% of children in the United States and is a leading cause of healthcare utilization and antibiotic prescriptions. Prospective identification of children at risk of rAOM could help target interventions and identify new risk factors to guide preventive approaches. We therefore sought to develop predictive models to identify children at risk of rAOM using electronic health records (EHR) data. Methods: We extracted retrospective EHR data for children who were born at Duke University Health System (DUHS) hospitals between January 1, 2014, and June 30, 2022, and who had at least one AOM episode during the study period. We used LASSO to build predictive models for development of rAOM at each episode and identified factors associated with rAOM. Results: We identified 6,566 children who met the study criteria, including 1,634 (24.8%) who met criteria for rAOM. A model using only data available at the first AOM episode had an area under the curve (AUC) of 0.75 (0.73, 0.77) and an Area Under the Precision Recall Curve (AUPRC) of 0.41 (95% CI 0.37, 0.46), indicating moderate discriminative ability. At the time of the first AOM episode, features associated with subsequent rAOM development included age, number of prior antibiotic prescriptions, and diagnosis of gastroesophageal reflux disease (GERD). Further, children who developed rAOM were more likely to experience treatment failure than children who did not meet rAOM criteria across all episodes. Conclusions: Our findings indicate that clinical exposures and patient characteristics documented in the EHR distinguish children who are at risk of developing rAOM. Such models could be deployed within EHR systems to identify children who would benefit from early evaluation by an otolaryngologist and audiologist.
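The rAOM definition quoted in this abstract (at least 3 episodes in 6 months or at least 4 in 12 months) can be made concrete with a small Python sketch. It assumes each date marks a distinct AOM episode and approximates 6 and 12 months as 183 and 365 days; the function name `meets_raom`, the window lengths and the example dates are illustrative assumptions, not the authors' EHR logic.

```python
from datetime import date, timedelta

def meets_raom(episode_dates,
               six_months=timedelta(days=183),      # assumed length of "6 months"
               twelve_months=timedelta(days=365)):  # assumed length of "12 months"
    """True if >=3 episodes fall in a 6-month window or >=4 in a 12-month window."""
    dates = sorted(episode_dates)

    def has_cluster(count, window):
        # Slide over every run of `count` consecutive episodes and check
        # whether the first and last of the run fit inside the window.
        return any(dates[i + count - 1] - dates[i] <= window
                   for i in range(len(dates) - count + 1))

    return has_cluster(3, six_months) or has_cluster(4, twelve_months)

# Hypothetical episode histories for illustration only.
three_close = [date(2020, 1, 10), date(2020, 3, 5), date(2020, 5, 20)]
four_in_year = [date(2020, 1, 1), date(2020, 5, 1), date(2020, 9, 1), date(2020, 12, 15)]
three_spread = [date(2020, 1, 1), date(2020, 6, 1), date(2020, 12, 1)]
```

Here `three_close` and `four_in_year` satisfy the definition through the 6-month and 12-month rules respectively, while `three_spread` does not. A real EHR implementation would also need a rule for deciding when two visits belong to the same episode, which the abstract does not specify.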

6

Improving Automated Diagnosis of Middle and Inner Ear Pathologies by Estimating Middle Ear Input Impedance from Wideband Tympanometry

Kamau, A. F.; Merchant, G. R.; Nakajima, H. H.; Neely, S. T.

2026-03-31 otolaryngology 10.64898/2026.03.26.26349034 medRxiv

Top 0.1%

12.5%

Conductive hearing loss (CHL) with a normal otoscopic exam can be difficult to diagnose because routine clinical measures such as audiometric air-bone gaps (ABGs) can identify a conductive component but often cannot distinguish among specific underlying mechanical pathologies (e.g., stapes fixation versus superior canal dehiscence, which may produce similar audiograms). Wideband tympanometry (WBT) is a fast, noninvasive test that can provide additional mechanical information across a broad range of frequencies (200 Hz to 8 kHz). However, WBT metrics are influenced by variations in ear canal geometry and probe placement and can be challenging to interpret clinically. In this study, we extend prior WBT absorbance-based classification work by estimating the middle ear input impedance at the tympanic membrane (ZME), a WBT-derived metric intended to reduce ear canal effects. To estimate ZME, we fit an analog circuit model of the ear canal, middle ear, and inner ear to raw WBT data collected at tympanometric peak pressure (TPP). Data from 27 normal ears, 32 ears with superior canal dehiscence, and 38 ears with stapes fixation were analyzed. A multinomial logistic regression classifier was trained using principal component analysis (retaining 90% variance) and stratified 5-fold cross-validation with regularization. We compared feature sets based on ABGs alone, ABGs combined with absorbance, and ABGs combined with the magnitude of ZME. The combination of ABGs and the magnitude of ZME produced the best performance, achieving an overall accuracy of 85.6% compared to 80.4% for ABGs alone and 78.4% for ABGs combined with absorbance. These results suggest that incorporating model-derived middle ear impedance features with standard audiometric measures (ABGs) can improve automated pathology classification for stapes fixation and superior canal dehiscence.
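The reported accuracies can be sanity-checked against the sample size given in this abstract. The sketch below assumes, and the abstract does not state this, that each accuracy is the fraction of all 97 ears classified correctly when the cross-validation folds are pooled; the helper `implied_correct` is hypothetical.

```python
n_ears = 27 + 32 + 38  # normal, superior canal dehiscence, stapes fixation

# Overall accuracies (%) as reported in the abstract.
reported = {"ABG": 80.4, "ABG + absorbance": 78.4, "ABG + |ZME|": 85.6}

def implied_correct(accuracy_pct, n):
    """Nearest whole number of correct classifications for a reported accuracy."""
    return round(accuracy_pct / 100 * n)

implied = {name: implied_correct(pct, n_ears) for name, pct in reported.items()}

# Round-trip: does each implied count reproduce the reported percentage?
round_trip = {name: round(100 * k / n_ears, 1) for name, k in implied.items()}
```

Under that assumption the three feature sets correspond to 78, 76 and 83 correctly classified ears out of 97, so the gain from adding the magnitude of ZME to ABGs amounts to about five ears.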

7

Effects of Aging, Hearing Loss, and Co-Activation on the Middle Ear Muscle Reflex and Medial Olivocochlear Reflex

Devolder, P.; Deloche, F.; Thienpont, M.; Keppler, H.; Verhulst, S.

2026-04-28 otolaryngology 10.64898/2026.04.27.26351829 medRxiv

Top 0.1%

12.2%

The middle ear muscle reflex (MEMR) and medial olivocochlear reflex (MOCR) are increasingly studied for their role in suprathreshold auditory processing. However, recording these reflexes in humans is potentially complicated by age-related (sub)clinical hearing loss and co-activation. This study investigates (1) the influence of age-related (sub)clinical hearing loss, (2) methodological differences between conventional and wideband MEMR techniques, and (3) how MEMR activation contaminates MOCR recordings. Three test groups were included: young normal-hearing adults, middle-aged normal-hearing adults, and middle-aged adults with audiometric hearing loss. Cochlear status and neural encoding were assessed using distortion-product otoacoustic emissions (DPOAEs) and envelope following responses (EFRs). MEMR recordings were compared using conventional tonal stimuli and wideband stimuli. MOCR was recorded at elicitor levels of 60 and 75 dB to evaluate MEMR co-activation. MEMR was related to age, suggesting sensitivity to subclinical cochlear damage. Wideband stimuli were beneficial as elicitor (noise vs. tone), while changing the probe stimuli added no significant benefit (click vs. tone). MOCR strength did not correlate with age-related subclinical hearing loss, suggesting that MOCR measurements may reflect efferent function relatively independently of afferent sensorineural status in audiometrically normal-hearing subjects. However, reliable recordings were challenging in participants with audiometric hearing loss due to poor OAE baselines. MEMR co-activation was detectable in the click response and could alter MOCR-induced suppression. These findings suggest that, in cases of normal hearing thresholds, MEMR amplitude may be a marker of subclinical cochlear damage and MOCR measurements may more specifically reflect efferent function. Clinical measurements can be improved using broadband stimuli, accounting for outer-hair-cell damage, and defining criteria for reflex co-activation.

8

Speech-in-Noise Difficulties in Aminoglycoside Ototoxicity Reflects Combined Afferent and Efferent Dysfunction

Motlagh Zadeh, L.; Izhiman, D.; Blankenship, C. M.; Moore, D. R.; Martin, D. K.; Garinis, A.; Feeney, P.; Hunter, L. R.

2026-03-26 otolaryngology 10.64898/2026.03.23.26348719 medRxiv

Top 0.1%

10.3%

Objectives: Patients with cystic fibrosis (CF) often receive aminoglycosides (AGs) to manage recurrent pulmonary infections, placing them at risk for ototoxicity. Chronic AG use can lead to complex cochlear damage affecting inner and outer hair cells, the stria vascularis, and spiral ganglion neurons. The greatest damage is typically in the basal cochlear region, which encodes high-frequency hearing, with additional involvement of more apical regions. While extended-high-frequency (EHF) hearing loss (EHFHL; 9-16 kHz) is often the earliest sign of AG ototoxicity, speech in noise (SiN) effects are rarely studied. Our overall hypothesis is that SiN perception difficulties in individuals with CF, treated with AGs, are related to combined cochlear and neural damage, primarily in the EHF range but also in the standard frequency (SF; 0.25-8 kHz) range. Three mechanisms that contribute to SiN perception were evaluated in children and young adults: 1) a primary effect of reduced EHF sensitivity, measured by pure-tone audiometry (PTA) and transient-evoked otoacoustic emissions (TEOAEs); 2) a secondary effect of subclinical damage in the SF range, measured by PTA and TEOAEs; and 3) additional neural effects, measured by middle ear muscle reflex (MEMR) threshold (afferent) and growth functions (efferent). Design: A total of 185 participants were enrolled; 101 individuals with CF treated with intravenous AGs and 84 age- and sex-matched controls without hearing concerns or CF. Assessments included EHF and SF PTA; the Bamford-Kowal-Bench (BKB)-SIN test for SiN perception; double-evoked TEOAEs with chirp stimuli from 0.71 to 14.7 kHz; and ipsilateral and contralateral wideband MEMR thresholds and growth functions using broadband stimuli. Results: Reduced sensitivity at EHFs (PTA, TEOAEs) was not associated with impaired SiN perception in the CF group. SF hearing, regardless of EHF status, was the primary predictor of SiN performance in the CF group. Increased MEMR growth was also significantly associated with poorer SiN in the CF group. Conclusions: In CF, impaired SiN perception was primarily predicted by SF hearing impairment, with additional involvement of the efferent auditory pathway through increased MEMR growth. These results build on prior evidence for efferent neural effects due to ototoxic exposures, supporting both sensory (afferent) and neural (efferent) mechanisms that contribute to listening difficulties in CF. Thus, preventive and intervention strategies should consider these combined mechanisms in people with AG ototoxicity to address their SiN problems.

9

Incidence and Severity of Carboplatin-Associated Hearing Loss in Children with Cancer Assessed by the SIOP 2012 Ototoxicity Criteria

Chawla, A.; Carter, S.; Wood, A.; Staffieri, S.; Dodgshun, A.; Eisenstat, D.; Sullivan, M.

2026-05-30 pediatrics 10.64898/2026.05.21.26353442 medRxiv

Top 0.1%

10.3%

Background: Platinum-based chemotherapy is known to cause severe and debilitating hearing loss, but unlike for cisplatin, the true incidence of carboplatin-induced hearing loss remains unclear. We evaluated functional hearing outcomes in children receiving carboplatin to determine the incidence and severity of ototoxicity. Procedure: We identified a large cohort of children with cancer treated with carboplatin and graded their audiograms using the SIOP ototoxicity scale. Patients with inadequate audiological follow-up, prior hearing loss, or exposure to cisplatin were excluded. Fisher's exact test, logistic regression, and ROC analyses were performed to investigate associations of demographic, treatment, and exposure-related risk factors with incidence of hearing loss. Results: 200 patients were included, all of whom had been treated with carboplatin. Only nine (4.5%) patients developed clinically significant hearing loss (SIOP grade ≥2). Younger age at first exposure to carboplatin was the only significant predictor of hearing loss (OR = 0.7888, p = 0.0241). Age ≤28 months was significantly associated with hearing loss (OR 12.37, p = 0.0042). No other risk factors or exposures were statistically significant. Conclusions: Clinically significant carboplatin-associated hearing loss was uncommon (incidence 4.5%). We show that young age is the single most important risk factor for hearing loss; of nine children who developed hearing loss, eight were aged ≤28 months. Children at or below this age have twelve-fold higher odds of developing hearing loss compared to those above this age (OR 12.37). These findings will allow physicians to provide more appropriate counselling to families regarding ototoxic risk and support intensified hearing surveillance in young children.

10

Differentiating the Physiological Signatures of Cochlear Synaptopathy and Inner Hair Cell Damage in a Chinchilla Model

Sivaprakasam, A.; Schweinzger, I.; Heinz, M.

2026-05-08 neuroscience 10.64898/2026.05.05.723072 medRxiv

Top 0.1%

7.1%

Aging and noise over-exposure lead to complex mixtures of cochlear degradation that impair the structure and function of outer hair cells, inner hair cells (IHCs), and the cochlear nerve. However, IHC damage and cochlear synaptopathy (CS) remain pathologies "hidden" from the audiogram. This study aimed to identify and differentiate the physiological signatures of these two distinct pathologies using promising non-invasive assays: Envelope Following Responses (EFRs), Auditory Brainstem Responses (ABRs), Wideband middle-ear reflexes (WB-MEMRs), and Distortion Product Otoacoustic Emissions (DPOAEs). We utilized chinchilla models of carboplatin-induced (CA) IHC damage (N = 4) and temporary threshold shift (TTS) noise-induced CS (N = 4) to compare the physiological signatures of each pathology. While both groups showed unchanged ABR thresholds two weeks after exposure, EFRs, ABR Wave V/I ratios, and MEMRs showed distinct effects of exposure. Despite non-elevated ABR-derived audiometric thresholds after exposure, both CA and TTS exposure resulted in severe degradation in EFR "peakiness", particularly for sharp, short-duty-cycle stimuli, and in significant elevations in ABR Wave V/I ratios. However, these findings were less pronounced in the TTS-exposed animals. WB-MEMR amplitudes were decreased with elevated thresholds in both groups; this effect was more pronounced in the TTS group. Opposite trends in DPOAE amplitudes indicated that while both IHC damage and CS result in similar suprathreshold temporal coding deficits, effects on outer-hair-cell integrity and auditory efferent physiology may differ between the two pathologies. Future work and novel diagnostics should aim to distinguish these specific cochlear pathologies in clinical populations, or at the very least consider their overlap.

Highlights:
- A multi-metric diagnostic approach was used with chinchilla models of inner-hair-cell (IHC) damage and cochlear synaptopathy (CS).
- IHC damage and synaptopathy both cause suprathreshold deficits "hidden" from the audiogram.
- IHC damage results in more severe temporal envelope coding degradation than does synaptopathy.
- A combination of EFR "peakiness", ABR Wave V/I ratio, and Wideband Middle Ear Muscle Reflex (WB-MEMR) appears to be useful for profiling IHC damage and CS.

11

A Blinded Comparative Evaluation of Clinical and AI-Generated Responses to Otologic Patient Queries

Akinniyi, S.; Jain-Poster, K.; Evangelista, E.; Yoshikawa, N.; Rivero, A.

2026-04-15 otolaryngology 10.64898/2026.04.14.26350677 medRxiv

Top 0.1%

4.9%

Objective: The objective of this study is to assess the quality, empathy, and readability of large language model (LLM) responses regarding otologic questions from patients as they compare to verified physician responses in other patient-driven forums. This study aims to predict the potential utility of LLMs in patient-centered communication. Study Design: Comparative study. Setting: Internet. Methods: A sample of 49 otology-related questions posted on Reddit r/AskDocs between January 2020 and June 2025 was selected using search terms including "hearing loss," "ear infection," "tinnitus," "ear pain," and "vertigo." Posts were retrieved using Reddit's "Top" filter. Each question was answered by a verified doctor on Reddit and three AI LLMs (ChatGPT-4o, ClaudeAI, Google Gemini). Responses were scored by five evaluators. Results: Common otologic concerns posed in patient questions were otalgia (38.7%), vertigo (28.6%), tinnitus (24.5%), hearing loss (22.4%), and aural fullness (20.4%). LLM responses were longer than physician responses (mean 145 vs 67 words; p < .05) and rated higher in quality (10.95 vs 9.58), empathy (7.26 vs 5.18), and readability (4.00 vs 3.73) (all p < .05). Evaluators correctly identified AI versus physician responses in 89.4% of cases, with higher sensitivity for detecting physician responses (93.5%). By Flesch-Kincaid grade level, ChatGPT produced the most readable content (mean 7.25), while ClaudeAI responses were more complex (11.86; p < .05). Conclusion: LLM responses received higher ratings in quality, empathy, and readability than those of physicians in response to a variety of otologic concerns. When appropriately implemented, such systems may enhance access to understandable otologic information and complement clinician-delivered care.
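For context on the readability figures in this abstract, the standard Flesch-Kincaid grade level formula can be written as a short Python sketch. The function name and the example counts are illustrative assumptions; the abstract does not say which tool the authors used, and in practice the word, sentence and syllable counts come from a tokenizer and syllable counter whose choices affect the score.

```python
def flesch_kincaid_grade(words, sentences, syllables):
    """Flesch-Kincaid grade level from raw counts of a text.

    grade = 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    """
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

# Hypothetical counts for a 100-word reply: 5 sentences, 150 syllables.
example_grade = flesch_kincaid_grade(words=100, sentences=5, syllables=150)
```

With these made-up counts the grade is about 9.9. Longer sentences and more syllables per word both raise the score, so a higher grade level means text that is harder to read.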

12

EEG correlates of auditory rise time processing: A systematic review

Manasevich, V.; Kostanian, D.; Rogachev, A.; Sysoeva, O.

2026-03-09 neuroscience 10.64898/2026.03.06.710012 medRxiv

Top 0.1%

4.5%

Rise time (RT) is considered to be one of the most significant acoustical characteristics of auditory speech stimuli. A substantial amount of data has been accumulated on the neurophysiological mechanisms of RT processing under different conditions and in different groups of people, but these data have not been systematised. This review focuses on studies that have investigated electroencephalographic (EEG) markers of RT sensitivity. The literature search was conducted according to the PRISMA statement in the PubMed, Web of Science and APA PsycInfo databases. The resulting review comprised 37 studies that considered diverse aspects of RT processing. The review describes the main stimulation parameters affecting electrophysiological markers of RT processing reflected in different components of event-related potentials (ERPs), brainstem responses and cortical rhythmic activity. The main finding of this review is that rise time prolongation leads to a decrease in the amplitude of the main ERP components and an increase in their latencies. However, the sensitivity of the EEG markers varied, with the earliest components tracking subtle differences (a few tens of microseconds) and the later components coding larger ones (up to 500 ms). Nevertheless, the observed effects may vary and depend on aspects of the experimental paradigm, the age of participants and speech-related problems. Future research may benefit from addressing understudied clinical groups and ERP components such as P1 and N2, which dominate in children.

13

Discrimination of spectrally sparse complex-tone triads in cochlear implant listeners

Augsten, M.-L.; Lindenbeck, M. J.; Laback, B.

2026-03-24 neuroscience 10.64898/2026.03.20.712905 medRxiv

Top 0.1%

4.3%

Cochlear implant (CI) users typically experience difficulties perceiving musical harmony due to a restricted spectro-temporal resolution at the electrode-nerve interface, resulting in limited pitch perception. We investigated how stimulus parameters affect discrimination of complex-tone triads (three-voice chords), aiming to identify conditions that maximize perceptual sensitivity. Six post-lingually deafened CI listeners completed a same/different task with harmonic complex tones, while spectral complexity, voice(s) containing a pitch change, and temporal synchrony (simultaneous vs. sequential triad presentation) were manipulated. CI listeners discriminated harmonically relevant one-semitone pitch changes within triads when spectral complexity was reduced to three or five components per voice, with significantly better performance for three-component compared to nine-component tones. Sensitivity was observed for pitch changes in the high voice or in both high and low voices, but not for changes in only the low voice. Single-voice sensitivity predicted simultaneous-triad sensitivity when controlling for spectral complexity and voice with pitch change. Contrary to expectations, sequential triad presentation did not improve discrimination. An analysis of processor pulse patterns suggests that difference-frequency cues encoded in the temporal envelope rather than place-of-excitation cues underlie perceptual triad sensitivity. These findings support reducing spectral complexity to enhance chord discrimination for CI users based on temporal cues.

14

Neural Correlates of Listening States, Cognitive Load, and Selective Attention in an Ecological Multi-Talker Scenario

Shahsavari Baboukani, P.; Ordonez, R.; Gravesen, C.; Ostergaard, J.; Rank, M. L.; Alickovic, E.; Cabrera, A. F.

2026-03-15 neuroscience 10.64898/2026.03.13.711289 medRxiv

Top 0.1%

3.6%

This study assessed neural responses to continuous speech to classify listening state, cognitive load, and selective auditory attention in complex acoustic environments. EEG was recorded while participants listened to concurrent male and female talkers under two conditions: active listening, where attention was directed to one of two competing speakers (target vs. masker), or passive listening, where attention was diverted to a visual task. Cognitive load was varied by manipulating the target-to-masker ratio (TMR: +7 dB, -7 dB), with lower TMR representing more demanding listening conditions. Spectral EEG features across frequency bands were ranked with univariate statistics and used to classify listening state (active vs. passive) and cognitive load (low vs. high TMR). Auditory attention decoding (AAD) was performed using linear stimulus reconstruction to identify the target talker during active listening. Classification of listening state achieved 90.3% accuracy, and AAD reached 84.4% accuracy, demonstrating robust tracking of attentional engagement. In contrast, classification of cognitive load was near chance, suggesting that more extreme acoustic manipulations may be required to elicit distinct neural signatures. Comparable performance using a reduced set of electrodes near the ear indicates the potential for integration with wearable hearing devices. Overall, these results demonstrate that EEG can distinguish attentional states and selectively track target speech in realistic auditory scenarios. The findings provide a foundation for future applications in monitoring listening behavior, supporting auditory processing, and improving brain-controlled hearing aids in complex acoustic environments.

Highlights:
- Listening state (active vs. passive) can be classified from EEG spectral features.
- Attended speech can be decoded by reconstructing speech envelopes from EEG.
- Comparable accuracy is achieved using only electrodes placed around the ears.
- EEG can monitor listening state and track auditory attention in two-speaker settings.

Graphical Abstract: EEG signals were recorded while participants listened to two concurrent speech streams, either by actively attending to one speaker or by focusing on an unrelated visual task. Spectral features of the EEG were used to classify listening state (active vs. passive) and cognitive load (low vs. high TMR). Auditory attention decoding (AAD) was performed by reconstructing the speech envelope from the EEG time signal.

[Figure 1] Classification of listening state (active vs. passive): 90.3% accuracy. EEG difference between active and passive listening. Left: power spectrum; right: topographic map (alpha band, 8-12 Hz). Classification of cognitive load (low vs. high TMR): near chance level. EEG difference between low and high TMR. Left: power spectrum; right: topographic map (alpha band, 8-12 Hz).

[Figure 2] AAD achieved 84.4% accuracy, indicating robust decoding of the attended speaker during active listening, while performance dropped to near chance during passive listening.
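As a rough illustration of the linear stimulus reconstruction mentioned in this abstract, the sketch below trains a ridge-regularised backward decoder on time-lagged EEG and picks the talker whose envelope correlates best with the reconstruction. It is not the authors' pipeline: the function names, the lag count, the ridge value, and the white-noise toy data are all assumptions made for the example.

```python
import numpy as np

def lagged(eeg, n_lags):
    """Stack EEG samples from t to t + n_lags - 1 into one feature row per time t."""
    n_samples, n_channels = eeg.shape
    features = np.zeros((n_samples, n_channels * n_lags))
    for lag in range(n_lags):
        cols = slice(lag * n_channels, (lag + 1) * n_channels)
        features[: n_samples - lag, cols] = eeg[lag:]
    return features

def train_decoder(eeg, attended_envelope, n_lags=4, ridge=1.0):
    """Ridge-regularised least squares mapping lagged EEG to the attended envelope."""
    X = lagged(eeg, n_lags)
    return np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ attended_envelope)

def decode_attention(eeg, env_a, env_b, decoder, n_lags=4):
    """Return 'A' or 'B' for whichever envelope correlates better with the reconstruction."""
    reconstruction = lagged(eeg, n_lags) @ decoder
    r_a = np.corrcoef(reconstruction, env_a)[0, 1]
    r_b = np.corrcoef(reconstruction, env_b)[0, 1]
    return "A" if r_a > r_b else "B"

# Toy data (hypothetical): 8 channels that follow talker A's envelope two samples later.
rng = np.random.default_rng(0)
mixing = rng.standard_normal(8)

def simulate(n_samples=2000):
    env_a, env_b = rng.standard_normal(n_samples), rng.standard_normal(n_samples)
    eeg = np.outer(np.roll(env_a, 2), mixing) + rng.standard_normal((n_samples, 8))
    return eeg, env_a, env_b

train_eeg, train_a, _ = simulate()
test_eeg, test_a, test_b = simulate()
decoder = train_decoder(train_eeg, train_a)
decision = decode_attention(test_eeg, test_a, test_b, decoder)
```

In practice the decision would be made on short windows of real EEG and speech envelopes, and accuracy would be the fraction of windows decoded correctly.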

15

Investigating neural speech processing with functional near infrared spectroscopy: considerations for temporal response functions

Wilroth, J.; Sotero Silva, N.; Tafakkor, A.; de Avo Mesquita, B.; Ip, E. Y. J.; Lau, B. K.; Hannah, J.; Di Liberto, G. M.

2026-03-23 neuroscience 10.64898/2026.03.20.713212 medRxiv

Top 0.1%

2.6%

Functional near infrared spectroscopy (fNIRS) is increasingly used in hearing and communication research, with advantages such as robustness to movement artifacts, improved spatial resolution, and flexibility of contexts in which it can be applied. At the same time, the field is progressively moving towards more continuous, naturalistic listening paradigms, resulting in the widespread adoption of speech tracking analyses such as temporal response functions (TRFs) in electroencephalography (EEG) and magnetoencephalography (MEG) studies. However, it remains unclear whether these analyses can be applied to slower haemodynamic signals measured by fNIRS. In the present study, we investigated whether a TRF framework can similarly be applied to fNIRS data recorded during continuous speech perception. Eight participants listened to speech simultaneously while fNIRS signals were acquired in a hyperscanning setup. Speech features were regressed onto the haemodynamic responses to test the feasibility and interpretability of fNIRS-based TRFs. Prediction correlations between observed and modelled fNIRS signals across speech features were higher than those typically reported for EEG- and comparable to those reported for MEG-TRF studies. Moreover, these correlations did not overlap with a null distribution generated from trial-mismatched fNIRS data, confirming statistical significance, and were slightly greater than those obtained from a conventional GLM approach. Our findings indicate that the TRF estimation method can yield meaningful and statistically significant responses from fNIRS data.

Highlights:
- TRF modelling can be meaningfully applied to fNIRS data acquired during speech listening tasks.
- Prediction correlations between actual and modelled fNIRS signals were above chance level, with values comparable to previous EEG/MEG studies.
- TRFs explained more fNIRS variance than a conventional GLM approach.
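The trial-mismatched null distribution described in this abstract can be sketched as follows, assuming per-trial model predictions and recordings of equal length. The helper names and the toy signals are hypothetical, and the abstract does not specify how the authors paired mismatched trials; here every prediction is simply correlated with every other trial's recording.

```python
import numpy as np

def pearson(a, b):
    """Pearson correlation between two 1-D signals."""
    return float(np.corrcoef(a, b)[0, 1])

def matched_and_null(predicted, observed):
    """Correlate each prediction with its own trial (matched) and with all other trials (null)."""
    n_trials = len(predicted)
    matched = [pearson(predicted[i], observed[i]) for i in range(n_trials)]
    null = [
        pearson(predicted[i], observed[j])
        for i in range(n_trials)
        for j in range(n_trials)
        if i != j
    ]
    return np.array(matched), np.array(null)

# Toy data (hypothetical): six trials where the "model output" is the signal plus noise.
rng = np.random.default_rng(0)
observed = [rng.standard_normal(500) for _ in range(6)]
predicted = [trial + rng.standard_normal(500) for trial in observed]
matched, null = matched_and_null(predicted, observed)
```

If the matched correlations sit entirely above the null values, as the abstract reports for the fNIRS data, the model is predicting trial-specific signal rather than structure common to all trials.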

16

When feedback backfires: investigating neurofeedback effects in a closed-loop auditory attention decoding paradigm

Rotaru, I.; Geirnaert, S.; Heintz, N.; Bertrand, A.; Francart, T.

2026-04-30 neuroscience 10.64898/2026.04.28.721343 medRxiv

Top 0.1%

2.1%

Selective auditory attention decoding (AAD) enables tracking which of multiple concurrent speakers a listener attends to and is a key building block for neuro-steered hearing devices. While AAD integrated in a closed-loop system with real-time neurofeedback (NFB) is hypothesized to improve decoding through neural adaptation and error-correction behaviour, the short-term behavioral and algorithmic impact of such a bilateral human-machine interaction remains poorly understood. Here we evaluated the effects of NFB on AAD accuracy and user experience in a single-session AAD paradigm with online NFB involving nineteen participants. They performed a selective listening task with enforced attention switches across four conditions: open-loop (OL), closed-loop with auditory gain feedback (CLA), closed-loop with visual feedback (CLV), and a condition with pseudo-auditory gain control (psCLA) decoupled from the participants' individual neural activity. AAD was performed online using both subject-specific and subject-independent linear decoders on 5 s sliding windows, followed by Hidden Markov Model post-processing. Online analysis showed comparable decoding performance across all conditions. However, offline post hoc analysis using subject-independent decoders revealed that AAD accuracy in the CLA condition was significantly lower than in the OL baseline. Subjectively, participants reported that CLA was significantly more distracting and required higher switching effort. Crucially, a causal analysis of the psCLA condition found no robust evidence that higher audio gains inherently improve decoding accuracy. Our results demonstrate that within a single-session paradigm with rapidly varying feedback cues, auditory neurofeedback may degrade AAD performance by increasing cognitive load and distraction. These findings suggest that suboptimal feedback can impede rather than facilitate learning. We conclude that more accurate and stable decoders and longitudinal, multi-session training protocols are likely essential prerequisites for achieving beneficial neurofeedback effects in closed-loop auditory attention systems.

17

The Sleep-Wake Classification Performance of Pediatric-Trained Machine Learning Algorithms for Raw Accelerometer Data

Chen, P.-W.; Cielo, C.; Walsh, O.; Mcdonald, M.; Song, P. X.; Goldstein, C.; Moreno, J. P.; Jansen, E.; Mitchell, J. A.

2026-06-01 pediatrics 10.64898/2026.05.28.26354364 medRxiv

Top 0.1%

1.9%

Introduction: Actigraphy sleep-wake classification methods increasingly seek to leverage raw acceleration data and machine-learning-based classification, but performance evaluation in pediatrics is limited. We trained machine-learning models using pediatric data and compared their sleep-wake classification performance with existing algorithms for children. Methods: Sixty-five children (46% female, ages 5.3 to 17.7 years) completed in-lab overnight polysomnography and wore a GENEActiv device on their non-dominant wrist. The acceleration data were converted into 30-second epochs and aligned with physician-scored sleep-wake data from electroencephalography. Seven machine-learning models were trained using leave-one-subject-out cross-validation. Epoch-by-epoch analyses generated performance metrics (e.g., balanced accuracy [BA]) and discrepancy analyses provided overall sleep duration bias estimates. Models were ranked on the combination of highest performance and least bias using Euclidean distance scores, where a lower score represents performance closer to perfect and bias closer to zero. For benchmarking, we included GGIR sleep scoring algorithms and an adult-trained random forest classifier. Results: Overall, 560.1 hours of polysomnography and actigraphy data were collected (74.4% of epochs were scored as sleep). The pediatric-trained local-global long short-term memory (LSTM) classifier had the best epoch-by-epoch performance (e.g., BA=0.85, sensitivity=0.88, specificity=0.83, ROC-AUC=0.95, and Cohen kappa=0.67). These metrics exceeded those of an adult-trained random forest classifier and GGIR-based algorithms. Discrepancy analyses revealed that overall sleep duration was underestimated by an average of 25 minutes using the LSTM classifier, with no proportional bias. Conclusion: We trained seven pediatric sleep-wake classifiers that had strong ability to detect sleep and wake, with the LSTM classifier performing best.
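For readers unfamiliar with the epoch-by-epoch metrics named in this abstract, the following sketch computes sensitivity, specificity, balanced accuracy, and Cohen's kappa from a 2x2 confusion matrix, treating sleep as the positive class. The function name and the epoch counts are hypothetical and are not taken from the study.

```python
def epoch_metrics(tp, fn, tn, fp):
    """Epoch-by-epoch agreement metrics with sleep as the positive class.

    tp: sleep epochs scored as sleep, fn: sleep epochs scored as wake,
    tn: wake epochs scored as wake,  fp: wake epochs scored as sleep.
    """
    total = tp + fn + tn + fp
    sensitivity = tp / (tp + fn)  # proportion of sleep epochs detected
    specificity = tn / (tn + fp)  # proportion of wake epochs detected
    balanced_accuracy = (sensitivity + specificity) / 2
    observed_agreement = (tp + tn) / total
    # Agreement expected by chance from the marginal totals of both raters.
    expected_agreement = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / total**2
    kappa = (observed_agreement - expected_agreement) / (1 - expected_agreement)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": balanced_accuracy,
        "kappa": kappa,
    }

# Hypothetical counts chosen only to illustrate the arithmetic.
metrics = epoch_metrics(tp=88, fn=12, tn=83, fp=17)
```

Balanced accuracy is the mean of sensitivity and specificity, which is why it is less affected than raw accuracy when one class (here, sleep) dominates the epochs.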

18

From Variability to Synchrony: Non-linear Development of Auditory Neural Responses During the First Year of Life

Reisenberger, E.; Schabus, M.; Florea, C.; Angerer, M.; Reimann-Ayiköz, M.; Preiss, J.; Roehm, D.; Heib, D. P. J.; Fazelnia, C.; Ameen, M. S.

2026-03-04 developmental biology 10.64898/2026.02.20.706969 medRxiv

Top 0.1%

1.7%

In humans, the first year of life is characterized by rapid developmental changes, including substantial brain maturation. As a result, neural responses to auditory stimuli undergo marked changes during this period. In this study, we followed 69 infants across their first year of life and recorded high-density electroencephalography (hdEEG) at 2 weeks, 6 months, and 12 months postpartum. Infants were presented with pure beep tones to examine the development of neural responses to auditory stimulation. We analysed event-related potentials (ERPs), inter-trial phase coherence (ITPC), and time-frequency (TF) responses to the beep tones and controlled for arousal state during stimulus presentation. We found that with increasing age, neural responses became more pronounced and showed reduced trial-to-trial variability. Phase synchronization increased from 2 weeks to later developmental stages in a broad low-frequency range (0 to 11 Hz), indicating improved temporal alignment of brain responses over time. However, phase synchronization decreased from 6 to 12 months, suggesting a developmental transition towards more differentiated brain activity. Taken together, these findings demonstrate that auditory maturation during the first year of life follows a non-linear trajectory driven by dynamic changes in neural synchronization, reflecting the progressive refinement of functional neural circuits. Our results thus provide a critical benchmark for understanding the neural dynamics underlying sensory development during this period.

Impact Statement: Longitudinal high-density EEG recordings reveal that neural responses to auditory stimuli undergo non-linear developmental changes during the first year of life, driven by dynamic shifts in neural synchronization that reflect progressive refinement of auditory neural processing.
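Inter-trial phase coherence, the phase synchronization measure used in this abstract, is conventionally the length of the mean unit phase vector across trials. The sketch below shows that standard definition for a single time-frequency point; the function name and the example phases are hypothetical, and extracting phases from EEG (for example by wavelet transform) is outside its scope.

```python
import cmath
import math

def itpc(phases):
    """Inter-trial phase coherence for one time-frequency point.

    phases: one phase angle in radians per trial.
    Returns a value between 0 (phases cancel out) and 1 (identical phase on every trial).
    """
    mean_vector = sum(cmath.exp(1j * phase) for phase in phases) / len(phases)
    return abs(mean_vector)

# Hypothetical examples: perfectly aligned trials versus evenly spread phases.
aligned = itpc([0.3] * 10)
spread = itpc([0.0, math.pi / 2, math.pi, 3 * math.pi / 2])
```

Because only phase enters the calculation, ITPC reflects the timing consistency of responses across trials independently of their amplitude.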

19

Cohort profile: The Australian Children of the Digital Age (ACODA) longitudinal cohort study measuring the digital lives of Australians during early childhood

MacKenzie, J.; Johnson, D.; Sarra, G.; Matthews, J. R.; Martinez-Buelvas, L.; Trenaman, D.; Sefton-Green, J.; Howard, S. J.; Smith, S. S.; Danby, S.; Zabatiero, J.

2026-05-13 pediatrics 10.64898/2026.05.09.26352795 medRxiv

Top 0.1%

1.5%

Objectives: The Australian Children of the Digital Age (ACODA) study is a longitudinal cohort study investigating the digital lives of Australians during early childhood. This paper presents a comprehensive description of the study protocol and an overview of children's digital technology use in the home at the first wave of data collection. Methods: Caregivers of children aged 6 months to 5 years completed a survey that captured the availability and use of digital technology within the home, and child- and caregiver-related factors that may influence children's digital technology use. Results: A total of 3,388 caregivers from across all Australian states and territories completed the survey. The majority (98%) of children had digital technology and internet access within their homes. Most children (93%) used at least one device in the last year, with televisions, tablets, and mobile phones most frequently used (89%, 47%, 42%, respectively). Digital technology use started early, with 61% of children aged <1 year having used a television. A greater proportion of older children used devices, and for longer durations, than younger children. Across all ages, daily time was longest on televisions (M = 1:20, SD = 1:14), tablets (M = 1:06, SD = 1:36), and mobile phones (M = 0:30, SD = 1:05). Digital technology was used most for entertainment and learning activities, and was typically used with a caregiver and in lounge/living rooms. Conclusions: The ACODA study is the first longitudinal study to describe the digital technology use of Australians during early childhood and the context of this use. Data indicated that Australian children frequently used digital technology for entertainment and with their caregivers. Also, older children used digital technology more than younger children. Future waves will allow for exploration of changes in children's digital technology use over time, and associations with factors that may influence children's digital technology use.

20

Iconic Sound-Shape Correspondences in Aphasia

Dorsi, J.; Sandberg, C.; Lacey, S.; Nygaard, L.; Sathian, K.

2026-05-19 neuroscience 10.64898/2026.05.18.725976 medRxiv

Top 0.1%

1.3%

Purpose: To examine speech iconicity for shape in aphasia, we compared iconicity ratings from people with aphasia to those from neurologically intact individuals and evaluated how iconicity relates to phonological and semantic processing profiles in aphasia. Method: Eleven people with aphasia and 11 age- and gender-matched neurologically intact participants rated how rounded or pointed 50 auditory pseudowords sounded using a 5-point scale. Ratings from participants with aphasia were compared to predicted iconicity ratings derived from reference ratings from prior work and to ratings from neurologically intact participants. For each participant with aphasia, correlations between individual ratings and predicted ratings were related to measures of phonological and semantic processing. Results: Ratings from people with aphasia were significantly correlated with both the predicted ratings and the ratings from neurologically intact participants. The strength of the correlation between individual ratings and predicted ratings did not differ significantly between groups, although there was a trend toward weaker correlations in the aphasia group. There were indications that greater language impairment was associated with greater disruption of iconicity ratings; in particular, deficits in phonological segmentation and semantic processing were associated with reduced sensitivity to shape iconicity. Conclusion: These findings suggest that sensitivity to shape iconicity is preserved in individuals with aphasia to varying degrees. The specific nature of language impairment appears to play an important role in determining iconicity processing in aphasia.